By all these lovely tokens... Merging Conflicting Tokenizations
Abstract
Given the contemporary trend toward modular NLP architectures and multiple annotation frameworks, the existence of concurrent tokenizations of the same text is a pervasive problem in everyday NLP practice and poses a non-trivial theoretical problem for the integration of linguistic annotations and their interpretability in general. This paper describes a solution for integrating different tokenizations using a standoff XML format, and discusses the consequences for the handling of queries on annotated corpora.
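The abstract does not spell out the merging procedure, so the following is only an illustrative sketch of one common way to reconcile two tokenizations in a standoff setting, not the paper's method. It assumes each tokenization can be aligned to character offsets in the same primary text, and it derives the minimal segments induced by the union of all token boundaries. The function names `token_spans` and `merge_boundaries` and the example sentence are hypothetical.

```python
def token_spans(text, tokens):
    """Locate each token in `text`, left to right, as (start, end) character offsets."""
    spans, pos = [], 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if the token is not found
        end = start + len(tok)
        spans.append((start, end))
        pos = end
    return spans


def merge_boundaries(*span_lists):
    """Return the minimal segments induced by the union of all token boundaries.

    Stretches of text that no tokenization covers (e.g. whitespace) are skipped.
    """
    all_spans = [span for spans in span_lists for span in spans]
    cuts = sorted({offset for span in all_spans for offset in span})
    segments = []
    for start, end in zip(cuts, cuts[1:]):
        if any(s <= start and end <= e for s, e in all_spans):
            segments.append((start, end))
    return segments


# Hypothetical example: two tools disagree on the clitic and on the final period.
text = "Don't stop."
tokens_a = ["Don't", "stop", "."]
tokens_b = ["Do", "n't", "stop."]

segments = merge_boundaries(token_spans(text, tokens_a), token_spans(text, tokens_b))
print([text[s:e] for s, e in segments])
```

Each original token can then be expressed as a sequence of these shared segments, so annotations from either tokenization point to the same underlying offsets. This is one possible design under the stated assumptions; the actual format described in the paper may differ.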
Similar resources
Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features
This paper presents a generalized i-vector framework with phonetic tokenizations and tandem features for speaker verification as well as language identification. First, the tokens for calculating the zero-order statistics are extended from the MFCC trained Gaussian Mixture Models (GMM) components to phonetic phonemes, 3-grams and tandem feature trained GMM components using phoneme posterior prob...
One Tokenization per Source
We report in this paper the observation of one tokenization per source. That is, the same critical fragment in different sentences from the same source almost always realizes one and the same of its many possible tokenizations. This observation proves very helpful in sentence tokenization practice, and is argued to have far-reaching implications in natural language processing. ...
Watershed-based region merging using conflicting regions
In this paper, we present a method of watershed-based region merging using ‘conflicting regions’ for segmentation of gray level images. It is obvious that both regions and edges in an image give important clues to segmentation in our visual system. So our method uses information from both regions and edges properly. We first obtain initial segments by applying watershed transformation to the im...
Merging Beliefs with Goals A formal analysis based on Merging Operators
The specification and verification of cognitive agent systems leads to the integration of formal systems, such as the combination of logics of beliefs, goals and intentions in Rao and Georgeff’s BDICTL logic, the merging of conflicting informational attitudes such as knowledge bases or belief sources into beliefs, and the merging of conflicting motivational attitudes such as desires, obligation...
An Argumentation Framework for Merging Conflicting Knowledge Bases
The problem of merging multiple sources of information is central in many information processing areas such as database integration problems, multiple criteria decision making, etc. Recently several approaches have been proposed to merge classical propositional bases. These approaches are in general semantically defined. They use priorities, generally based on Dalal’s distance for merging clas...
Publication date: 2009